Improving speech recognition models with small samples for air traffic control systems
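Abstract
In the domain of air traffic control (ATC) systems, efforts to train a practical automatic speech recognition (ASR) model always face the problem of small training samples, since the collection and annotation of speech samples are expert- and domain-dependent tasks. In this work, a novel training approach based on pretraining and transfer learning is proposed to address this issue, and an improved end-to-end deep learning model is developed to address the specific challenges of ASR in the ATC domain. An unsupervised pretraining strategy is first proposed to learn speech representations from unlabeled samples for a certain dataset. Specifically, a masking strategy is applied to improve the diversity of the samples without losing their general patterns. Subsequently, transfer learning is applied to fine-tune a pretrained or other optimized baseline model to finally accomplish the supervised ASR task. By virtue of the common terminology used in the ATC domain, the transfer learning task can be regarded as a sub-domain adaptation task, in which the transferred model is optimized using a joint corpus consisting of baseline samples and new transcribed samples from the target dataset. This joint corpus construction strategy enriches the size and diversity of the training samples, which is important for addressing the issue of the small transcribed corpus. In addition, speed perturbation is applied to augment the new transcribed samples to further improve the quality of the speech corpus. Three real ATC datasets are used to validate the proposed ASR model and training strategies. The experimental results demonstrate that the ASR performance is significantly improved on all three datasets, with an absolute character error rate only one-third of that achieved through supervised training. The applicability of the proposed strategies to other ASR approaches is also validated.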
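As a rough illustration of two ideas mentioned in the abstract, the sketch below shows one way speed perturbation and the character error rate could be computed. It is not the authors' implementation: the function names, the linear-interpolation resampling, and the choice to count every character (including spaces) are assumptions made for this example only.

```python
from typing import List, Sequence


def speed_perturb(samples: Sequence[float], factor: float) -> List[float]:
    """Resample a waveform so it plays `factor` times faster.

    Assumption: plain linear interpolation between neighbouring samples;
    real toolkits typically use a proper resampling filter.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not samples:
        return []
    n_out = max(1, int(round(len(samples) / factor)))
    out = []
    for i in range(n_out):
        pos = i * factor              # fractional position in the input
        lo = int(pos)
        if lo >= len(samples) - 1:    # past the last interval: hold last value
            out.append(float(samples[-1]))
            continue
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[lo + 1] * frac)
    return out


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """Edit distance divided by the reference length, in characters.

    Assumption: all characters, including spaces, are counted.
    """
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

Under these assumptions, a factor above 1 shortens the utterance and a factor below 1 lengthens it, so applying a few factors to each transcribed sample yields additional training variants that share the same transcript.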
Similar resources
Improving the performance of MFCC for Persian robust speech recognition
The Mel frequency cepstral coefficients (MFCCs) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in automatic speech recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing which applies to t...
Semi-Supervised Learning with Semantic Knowledge Extraction for Improved Speech Recognition in Air Traffic Control
Automatic Speech Recognition (ASR) can introduce higher levels of automation into Air Traffic Control (ATC), where spoken language is still the predominant form of communication. While ATC uses standard phraseology and a limited vocabulary, we need to adapt the speech recognition systems to local acoustic conditions and vocabularies at each airport to reach optimal performance. Due to continuou...
Improving language models for radiology speech recognition
Speech recognition systems have become increasingly popular as a means to produce radiology reports, for reasons both of efficiency and of cost. However, the suboptimal recognition accuracy of these systems can affect the productivity of the radiologists creating the text reports. We analyzed a database of over two million de-identified radiology reports to determine the strongest determinants ...
Improving Speech Recognition Using Episodic Models
In this paper, we propose a new technique using episodic models to improve the performance of the current state-of-the-art speech recognition systems. These models allow us to make use of metadata and environmental information such as speaker, gender, accent, and noise conditions. We recognise that we cannot entirely abandon HMMs, which are very powerful and highly scalable models. Hence, we propo...
Journal
Journal title: Neurocomputing
Year: 2021
ISSN: 0925-2312, 1872-8286
DOI: https://doi.org/10.1016/j.neucom.2020.08.092